Are past periods of economic downturn associated with other factors such as the national rate of unemployment, the fed funds rate, money supply, among other factors? I decided to take a closer look, and ultimately build a rough prediction model to determine (very roughly) when the next recession may be.
In [292]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import pandas.io.data as web
import datetime
import numpy as np
from scipy import stats
from patsy import dmatrices
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import train_test_split
from sklearn import metrics
from sklearn.cross_validation import cross_val_score
#start, end times used as parameters when pulling data from FRED.
start = datetime.datetime(1960, 1,1)
end= datetime.datetime(2016,5,11)
In [293]:
gdp=web.DataReader("GDPC1",'fred',start,end)
#real GDP quarterly
gdpchange=web.DataReader("A191RL1Q225SBEA",'fred',start,end)
#real GDP change from preceding period quarterly
CPIurban=web.DataReader("CPIAUCSL",'fred',start,end)
#CPI quarterly, monthly, urban consumers
fedfunds=web.DataReader("FEDFUNDS",'fred',start,end)
#monthly fed funds rate
unemployment=web.DataReader("UNRATE",'fred',start,end)
#monthly unemployment rate
civpart=web.DataReader("CIVPART",'fred',start,end)
#monthly laborforce participation
tenyear=web.DataReader("GS10",'fred',start,end)
#monthly fed funds rate
m2v=web.DataReader("M2V",'fred',start,end)
#monthly m2 velocity
industrial=m2v=web.DataReader("INDPRO",'fred',start,end)
#monthly industrial production index
The initial problem I encountered was that many of the datasets I collected from FRED did not share the same timescale, preventing any analysis. To fix this, I resampled each dataset to fall in line with the quarterly GDP reporting time scale, and truncated as necessary so all the dates matched.
In addition, for each dataset, I figured it would also be useful to add a new rate of change column for each dataset.
In [294]:
#Quarterly TimeGrouper
g=pd.TimeGrouper(freq='Q')
#Resampling monthly data into quarterly using the TimeGrouper
gdp3m=gdp.resample('Q',how='mean')
gdpchange3m=gdpchange.resample('Q',how='mean')
gdpchange3m=gdpchange3m[1:]
gdpchange3m.columns=['GDP_growth']
#For many of the datasets, including CPI, I created a new rate of change column.
CPIurban3m = CPIurban.groupby(g).mean()
CPIurban3m.columns=['CPI']
CPIurban3m['rate']=CPIurban3m['CPI'].pct_change()
CPIurban3m=CPIurban3m[1:]
unemployment3m=unemployment.groupby(g).mean()
unemployment3m.columns=['Unemployment Rate']
unemployment3m['Percent Change']=unemployment3m['Unemployment Rate'].pct_change()
unemployment3m=unemployment3m[1:]
fedfunds3m=fedfunds.groupby(g).mean()
fedfunds3m.columns=['FedFundsRate']
fedfunds3m['Percent Change']=fedfunds3m['FedFundsRate'].pct_change()
fedfunds3m=fedfunds3m[1:]
civpart3m=civpart.groupby(g).mean()
civpart3m.columns=['CivLabourPart']
civpart3m['Percent Change']=civpart3m['CivLabourPart'].pct_change()
civpart3m=civpart3m[1:]
tenyear3m=tenyear.groupby(g).mean()
tenyear3m.columns=['10YearBond']
tenyear3m['Percent Change']=tenyear3m['10YearBond'].pct_change()
tenyear3m=tenyear3m[1:]
m2v3m=m2v.groupby(g).mean()
m2v3m.columns=['M2Velocity']
m2v3m['Percent Change']=m2v3m['M2Velocity'].pct_change()
m2v3m=m2v3m[1:]
industrial3m=industrial.groupby(g).mean()
industrial3m.columns=['Production Index']
industrial3m['Percent Change']=industrial3m['Production Index'].pct_change()
industrial3m=industrial3m[1:]
In [295]:
final=gdpchange3m
final['CPI_growth']=CPIurban3m['rate']
final['Unemployment_Rate']=unemployment3m['Unemployment Rate']
final['Unemployment_Rate_growth']=unemployment3m['Percent Change']
final['Fed_Funds_Rate']=fedfunds3m['FedFundsRate']
final['Fed_Funds_Rate_growth']=fedfunds3m['Percent Change']
final['Civ_Labour_Part']=civpart3m['CivLabourPart']
final['Civ_Labour_Part_growth']=civpart3m['Percent Change']
final['Ten_Year_Bond']=tenyear3m['10YearBond']
final['Ten_Year_Bond_growth']=tenyear3m['Percent Change']
final['M2_Velocity']=m2v3m['M2Velocity']
final['M2_Velocity_growth']=m2v3m['Percent Change']
final['Production_Index_growth'] = industrial3m['Percent Change']
rowCount=len(final.index)
Final dataset looks good. No alignment issues here.
In [296]:
final
Out[296]:
For the sake of analysis, I realized I would also need to create a similar dataset where each row corresponds to a recession year. This way, I could compare each indicator's performance across all years with performance only in recession years.
In [297]:
finalOnlyNegatives = pd.DataFrame(np.nan, index=[0], columns=final.columns)
for index, row in final.iterrows():
if(row['GDP_growth'] < 0):
finalOnlyNegatives.loc[finalOnlyNegatives.index.max()+1]=row[0],row[1],row[2],row[3],row[4],row[5],row[6],row[7],row[8],row[9],row[10],row[11],row[12]
finalOnlyNegatives=finalOnlyNegatives[1:]
finalRowCount=len(finalOnlyNegatives.index)
finalOnlyNegatives
Out[297]:
Look good. Out of 224 recorded quarters, there are only 27 quarters of negative GDP growth.
Now lets dive deeper and see how the indicators differ in times of recession and across all quarters. The methodology is as follows:
For each indicator, determine the % of the time where growth is positive from the initial dataframe containing all records. This is where the rate of change column I created for each indicator comes in handy.
I can then compare this percentage with the percentage of the time where the indicator's growth is positive from the negative-GDP-growth-only data frame.
For the sake of statistical soundness, I also perform a 2-tailed Proportion p-test to determine which differences are statistically significant.
In [298]:
#cpi % positive growth across all quarters, compared with
CPIcount=0
for index, row in final.iterrows():
if(row['CPI_growth'] > 0):
CPIcount=CPIcount+1
CPIpercent = (CPIcount/rowCount)
#CPI % positive growth across negative GDP growth quarters.
newCPICount = 0
for index, row in finalOnlyNegatives.iterrows():
if(row['CPI_growth'] > 0):
newCPICount=newCPICount+1
newCPIpercent = (newCPICount/finalRowCount)
#CPI 2-tailed Proportion p-test of significance
CPIz=np.array([[rowCount,CPIcount],[finalRowCount,newCPICount]])
CPIp=(stats.chi2_contingency(CPIz)[1])
#The same is done for each of the indicators
#unemployment rate growth average and negative-only
unemploymentCount=0
for index, row in final.iterrows():
if(row['Unemployment_Rate_growth'] > 0):
unemploymentCount=unemploymentCount+1
unemploymentPercent = (unemploymentCount/rowCount)
newUnemploymentCount = 0
for index, row in finalOnlyNegatives.iterrows():
if(row['Unemployment_Rate_growth'] > 0):
newUnemploymentCount=newUnemploymentCount+1
newUnemploymentPercent = (newUnemploymentCount/finalRowCount)
unemploymentz=np.array([[rowCount,unemploymentCount],[finalRowCount,newUnemploymentCount]])
unemploymentp=(stats.chi2_contingency(unemploymentz)[1])
#fed funds rate average and negative-only
fedfundscount=0
for index, row in final.iterrows():
if(row['Fed_Funds_Rate_growth'] > 0):
fedfundscount=fedfundscount+1
fedfundspercent = (fedfundscount/rowCount)
newfedfundscount = 0
for index, row in finalOnlyNegatives.iterrows():
if(row['Fed_Funds_Rate_growth'] > 0):
newfedfundscount=newfedfundscount+1
newFedFundsPercent = (newfedfundscount/finalRowCount)
fedfundsz=np.array([[rowCount,fedfundscount],[finalRowCount,newfedfundscount]])
fedfundsp=(stats.chi2_contingency(fedfundsz)[1])
#civ labour participation rate average and negative-only
civlaborcount=0
for index, row in final.iterrows():
if(row['Civ_Labour_Part_growth'] > 0):
civlaborcount=civlaborcount+1
civlabourpercent = (civlaborcount/rowCount)
newcivlaborcount = 0
for index, row in finalOnlyNegatives.iterrows():
if(row['Civ_Labour_Part_growth'] > 0):
newcivlaborcount=newcivlaborcount+1
newcivlaborpercent = (newcivlaborcount/finalRowCount)
civlabourz=np.array([[rowCount,civlaborcount],[finalRowCount,newcivlaborcount]])
civlabourp=(stats.chi2_contingency(civlabourz)[1])
#ten year bond rate average and negative-only
tenyearcount=0
for index, row in final.iterrows():
if(row['Ten_Year_Bond_growth'] > 0):
tenyearcount=tenyearcount+1
tenyearpercent = (tenyearcount/rowCount)
newtenyearcount = 0
for index, row in finalOnlyNegatives.iterrows():
if(row['Ten_Year_Bond_growth'] > 0):
newtenyearcount=newtenyearcount+1
newtenyearpercent = (newtenyearcount/finalRowCount)
tenyearz=np.array([[rowCount,tenyearcount],[finalRowCount,newtenyearcount]])
tenyearp=(stats.chi2_contingency(tenyearz)[1])
#M2 velocity average and negative-only
m2count=0
for index, row in final.iterrows():
if(row['M2_Velocity_growth'] > 0):
m2count=m2count+1
m2percent = (m2count/rowCount)
newm2count = 0
for index, row in finalOnlyNegatives.iterrows():
if(row['M2_Velocity_growth'] > 0):
newm2count=newm2count+1
newm2percent = (newm2count/finalRowCount)
m2z=np.array([[rowCount,m2count],[finalRowCount,newm2count]])
m2p=(stats.chi2_contingency(m2z)[1])
#Production Index Growth average and negative-only
productioncount=0
for index, row in final.iterrows():
if(row['Production_Index_growth'] > 0):
productioncount=productioncount+1
productionpercent = (productioncount/rowCount)
newproductioncount = 0
for index, row in finalOnlyNegatives.iterrows():
if(row['Production_Index_growth'] > 0):
newproductioncount=newproductioncount+1
newproductionpercent = (newproductioncount/finalRowCount)
productionz=np.array([[rowCount,productioncount],[finalRowCount,newproductioncount]])
productionp=(stats.chi2_contingency(productionz)[1])
Now let's create a new dataframe to store the results.
In [299]:
beforeAfterCompare = pd.DataFrame(np.nan, index=[0], columns=final.columns)
beforeAfterCompare=beforeAfterCompare.drop('GDP_growth',axis=1)
beforeAfterCompare=beforeAfterCompare.drop('Fed_Funds_Rate',axis=1)
beforeAfterCompare=beforeAfterCompare.drop('Unemployment_Rate',axis=1)
beforeAfterCompare=beforeAfterCompare.drop('Civ_Labour_Part',axis=1)
beforeAfterCompare=beforeAfterCompare.drop('Ten_Year_Bond',axis=1)
beforeAfterCompare=beforeAfterCompare.drop('M2_Velocity',axis=1)
beforeAfterCompare['Type']=""
beforeAfterCompare.loc[0]=CPIpercent,unemploymentPercent,fedfundspercent,civlabourpercent,tenyearpercent,m2percent,productionpercent,'All Quarters'
beforeAfterCompare.loc[1]=newCPIpercent,newUnemploymentPercent,newFedFundsPercent,newcivlaborpercent,newtenyearpercent,newm2percent,newproductionpercent,'Recession Quarters'
beforeAfterCompare=beforeAfterCompare.set_index('Type')
beforeAfterCompare
Out[299]:
Looks good! With this dataframe we can clearly identify some indicators whose positive growth rates differs significantly when GDP growth is negative and across all quarters.
Namely, the unemployment rate increases only 34% of the time across all periods, but increases 81% of the time in times of recession. Other factors, such as the fed funds rate and the M2 Velocity also differ substantially.
Other factors differ moderately, or barely at all such as with CPI growth.
Just to confirm that these changes are significant, lets see what the p-values are.
In [300]:
significance = pd.DataFrame(np.nan, index=[0], columns=beforeAfterCompare.columns)
significance.loc[0]=CPIp,unemploymentp,fedfundsp,civlabourp,tenyearp,m2p,productionp,
significance
Out[300]:
Looks good. This confirms that those factors that differ substantially have low p-values, which signifies statistical significance. (Under .05 is a good cut off here.)
In [301]:
fig, ax = plt.subplots(figsize=(18,10))
beforeAfterCompare.plot(ax=ax,kind='bar',title='Differences in Key Indicators During Recession')
Out[301]:
In addition to comparing the percentage of the time where the indicator experiences positive growth, we can also compare the average value of the indicator across all time periods vs periods of negative GDP growth.
This analysis is split into two graphs in order to better see the differences.
In [302]:
final=final.reset_index()
final['Recession']=(final.GDP_growth < 0).astype(float)
finalvalues=final.groupby('Recession').mean()
finalvalues=finalvalues.drop("GDP_growth",axis=1)
finalvalues=finalvalues.drop("CPI_growth",axis=1)
finalvalues=finalvalues.drop("Unemployment_Rate_growth",axis=1)
finalvalues=finalvalues.drop("Fed_Funds_Rate_growth",axis=1)
finalvalues=finalvalues.drop("Civ_Labour_Part_growth",axis=1)
finalvalues=finalvalues.drop("Ten_Year_Bond_growth",axis=1)
finalvalues=finalvalues.drop("M2_Velocity_growth",axis=1)
finalvalues=finalvalues.drop("Production_Index_growth",axis=1)
finalvalues2=finalvalues
finalvalues=finalvalues.drop("Civ_Labour_Part",axis=1)
finalvalues=finalvalues.drop("M2_Velocity",axis=1)
finalvalues
Out[302]:
In [303]:
fig, ax = plt.subplots(figsize=(18,10))
finalvalues.plot(ax=ax,kind='bar',title='Differences in Key Indicators During Recession')
Out[303]:
In [304]:
finalvalues2=finalvalues2.drop("Unemployment_Rate",axis=1)
finalvalues2=finalvalues2.drop("Fed_Funds_Rate",axis=1)
finalvalues2=finalvalues2.drop("Ten_Year_Bond",axis=1)
finalvalues2
Out[304]:
In [305]:
fig, ax = plt.subplots(figsize=(18,10))
finalvalues2.plot(ax=ax,kind='bar',title='Differences in Key Indicators During Recession',ylim=(50,70))
Out[305]:
I created a train/test logistic regression model using SKlearn, which ultimately resulted in a moderately successful model. Since negative gdp growth occured in only 12% of periods, the null success rate here is 88%. If one guessed that every period would have no recession, they would be right 88% of the time.
This model is 92.6% accurate, which is little more than a third better than just guessing. This goes to show that it is difficult to create a prediction model for recessions from these indicators alone. Still, its better than nothing.
In [306]:
y, X = dmatrices('Recession ~ CPI_growth + Unemployment_Rate + Unemployment_Rate_growth + Fed_Funds_Rate + Civ_Labour_Part + Ten_Year_Bond+ M2_Velocity', final,return_type="dataframe")
In [307]:
y = np.ravel(y)
In [308]:
#this is the % of the time negative gdp growth occured.
y.mean()
Out[308]:
In [309]:
#creating test, train data sets, with test size of 30%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, random_state=2)
model = LogisticRegression()
model.fit(X_train, y_train)
Out[309]:
In [310]:
# predict class labels for the test set
predicted = model.predict(X_test)
print (predicted)
#making sure there is an occurance of negative GDP growth in the test set.
In [315]:
# This is the accuracy, 92.6%
print (metrics.accuracy_score(y_test, predicted))
Predicting GDP decline is hard stuff! Its obvious that there is a correlation with certain indicators, such as unemployment, the fed funds rate, and M2 velocity, but this is still just correlation.
Still, combining the moderately successful model with an awareness of the current status of these indicators would be a pretty good bet. If the model is predicting a period of negative GDP growth, and the correlated indicators are also in line with recession values, its likely a recession is right around the corner.